ODGOMS - results for OAEI 2013

Authors

  • I-Hong Kuo
  • Tai-Ting Wu
Abstract

ODGOMS is a multi-strategy ontology matching system that combines elemental level, structural level, and optimization level strategies. When it matches ontologies, it first applies appropriate string-based and token-based similarity strategies to find preliminary aligned results, and then filters and merges these results using the optimization strategies. Although ODGOMS uses simple matching logic, the results show that it is competitive with other well-known ontology matching tools.

1 Presentation of the system

1.1 State, purpose, general statement

ODGOMS (Open Data Group Ontology Matching System) is an ontology matching system developed under our company's forward-looking research plan. The goal of that plan is to offer people a user-friendly integrated interface for searching and browsing linked open data on the internet. The main idea of ODGOMS is to use simple but useful matching and merging strategies to produce robust aligned results. The strategies used in the system fall into three groups: elemental level strategies, structural level strategies, and optimization level strategies.

We submitted two versions of ODGOMS, version 1.1 and version 1.2, to the OAEI 2013 campaign. Version 1.2 is the better of the two because it fixes some bugs in version 1.1 and adds some new features. Since version 1.2 is the latest version of the system, the following sections describe only that version.

1.2 Specific techniques used

ODGOMS focuses on developing individual ontology matching modules for different matching aspects and on finding an appropriate way to merge the results of all matching modules.

* Supported by the forward-looking research plan of the Industrial Technology Research Institute. The plan is named "Data Refining for LOD Using Linked Data Integration Technology."
Each matching module of ODGOMS can be run individually by setting its filter threshold and the locations of the input ontologies. The system architecture of ODGOMS is shown in Fig. 1.

Fig. 1. System architecture of ODGOMS

The workflow of ODGOMS shown in Fig. 1 is as follows. The system first reads the input ontologies into memory and then runs each matching module individually: LabelMatcher, IDMatcher, LCSMatcher, SMOAMatcher, PurityMatcher, TFIDFMatcher, NETMatcher, PBCTMatcher, and PBCSMatcher. Next, a filtering module named ThresholdFilter filters the aligned results stored in each matching module, and an optimization module named AlignmentMerger merges them in a specific order. Finally, the system outputs the integrated aligned results. The modules are divided into three groups: elemental level modules, structural level modules, and optimization level modules. Each module is described in detail below.

1.2.1 Elemental Level Modules

LabelMatcher
For each entity in the first input ontology, this module finds the best-matching entity in the second input ontology that shares at least one label (e.g. rdfs:label) with it, and stores the pair as an aligned result. Before matching, it deletes non-English and non-numeric characters from the labels of the input entities and converts the labels to lowercase.

IDMatcher
The matching procedure of this module is the same as that of LabelMatcher, except that for each entity in the first input ontology it finds the best-matching entity in the second input ontology that has an identical ID (e.g. rdf:ID).

LCSMatcher
For each entity in the first input ontology, this module finds the entity in the second ontology with the highest LCS (Longest Common Subsequence) [5] similarity and stores the pair as an aligned result.
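The two building blocks named so far, label clean-up and the longest-common-subsequence length, can be sketched as follows. This is a minimal illustration, not the ODGOMS code: the function names `normalize_label` and `lcs_len` are hypothetical, and "non-English and non-numeric characters" is assumed to mean anything outside A-Z, a-z, and 0-9. How the LCS length is normalised into a similarity score is not shown here.

```python
import re


def normalize_label(label: str, lowercase: bool = True) -> str:
    """Drop every character that is not an English letter or a digit.

    Assumption: "English" characters are A-Z and a-z only.
    Lowercasing follows the LabelMatcher description and can be switched off.
    """
    cleaned = re.sub(r"[^A-Za-z0-9]", "", label)
    return cleaned.lower() if lowercase else cleaned


def lcs_len(a: str, b: str) -> int:
    """Length of the longest common subsequence of a and b.

    Standard dynamic programming, keeping only the previous row.
    """
    prev = [0] * (len(b) + 1)
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]
```

For example, `normalize_label("has_an_Email")` yields `"hasanemail"`, and the LCS length between that string and `"email"` is 5, because `"email"` occurs in it as a subsequence.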
When LCSMatcher computes the LCS similarity of two input entities, it first deletes non-English and non-numeric characters from all labels (e.g. rdf:ID, rdfs:label, rdfs:comment) of the input entities. It then computes the LCS similarity of each pair of labels between the two entities and takes the highest value as the final similarity of the two entities. The LCS similarity of two input labels is computed using the following equation:

[equation not preserved in the source text]

In the equation, A and B denote the input labels, the function LCSlen(A,B) returns the length of the longest common subsequence of A and B, and the functions Length(A) and Length(B) return the lengths of A and B respectively.

SMOAMatcher
The matching procedure of this module is the same as that of LCSMatcher, except that it replaces the LCS similarity computing scheme with the SMOA [4] similarity computing scheme.

PurityMatcher
The matching procedure of this module is similar to that of LabelMatcher and IDMatcher, except that it removes English stopwords (such as "has") from all labels of the classes and properties in the input ontologies before it starts to match them. It can find interesting aligned results such as the mapping between the labels "has_an_Email" and "email".

TFIDFMatcher
This module matches only classes from different input ontologies, based on the TF-IDF [1] Cosine similarity [2] computing scheme. The idea of using text-mining techniques (such as the TF-IDF representation) in the system is inspired by YAM++ version 2012 [6]. The matching procedure is as follows. For each class in the first input ontology, the module computes the TF-IDF Cosine similarity between that class and every class in the second ontology. It then chooses the class in the second ontology with the highest similarity and stores the pair as an aligned result.

To compute the TF-IDF Cosine similarity of two input classes, the module first splits all labels (e.g. rdf:ID and rdfs:label) of the two classes into two sets of English tokens, and then computes the TF-IDF value of each token in each set. The TF value of a token is the frequency with which the token appears in its token set, and the IDF value of a token is the inverse frequency with which the token appears across the token sets of all classes in the ontology. The module then normalizes the TF-IDF values of the two token sets, treats them as two normalized TF-IDF vectors, and finally computes the Cosine similarity of these two vectors.

NETMatcher
For each class in the first input ontology, this module finds the class in the second ontology with the highest NET (named-entity transformation) similarity and stores the pair as an aligned result. To compute the NET similarity of two input classes, it first deletes non-English and non-numeric characters from all labels (e.g. rdf:ID and rdfs:label) of the input classes and splits the labels into tokens. Note that if there are n tokens and n is at least 2, then at least n-1 of the tokens begin with a capital English letter or a digit. The module then computes the NET similarity of the input classes using the following equation:

[equation not preserved in the source text]

In the equation, A and B denote the token sets of the two input classes, the function commonTokens returns the number of tokens common to the two token sets, and the function commonPrefix returns the average proportion of common prefix characters relative to the total characters of all tokens in the two token sets. This module can find interesting aligned results such as the mappings of "OWL" to "Web Ontology Language" or "PCMembers" to "Program Community Members".

1.2.2 Structural Level Modules

The system currently has two structural level matching modules, PBCTMatcher and PBCSMatcher. The former computes the integrated similarities of classes using a token-based computing scheme, and the latter computes them using a string-based one.
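The token-based scheme referred to here is the TF-IDF Cosine computation described for TFIDFMatcher. A minimal sketch is given below under stated assumptions: the tokenisation rule (splitting on non-alphanumeric characters and camelCase boundaries), the smoothed logarithmic IDF, and all function names are illustrative choices, since the text does not specify them.

```python
import math
import re
from collections import Counter


def tokenize(label):
    """Split a label into lowercase tokens.

    Assumed rule: break on non-alphanumeric characters and camelCase boundaries.
    """
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|[0-9]+", label)
    return [p.lower() for p in parts]


def tfidf_vector(tokens, corpus):
    """L2-normalised TF-IDF weights for one token list.

    corpus is the list of token lists of all classes in the ontology.
    The IDF form (smoothed logarithm) is an assumption, not taken from the paper.
    """
    n_docs = len(corpus)
    vec = {}
    for tok, count in Counter(tokens).items():
        df = sum(1 for doc in corpus if tok in doc)
        idf = math.log((1 + n_docs) / (1 + df)) + 1.0
        vec[tok] = (count / len(tokens)) * idf
    norm = math.sqrt(sum(w * w for w in vec.values()))
    return {t: w / norm for t, w in vec.items()} if norm else {}


def cosine(u, v):
    """Cosine similarity of two already normalised sparse vectors."""
    return sum(w * v.get(t, 0.0) for t, w in u.items())
```

With hypothetical class labels such as "ConferencePaper" and "conference_paper", both tokenise to the same two tokens, so their cosine similarity is 1 up to rounding, while labels with no shared token score 0.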
The ideas behind PBCTMatcher and PBCSMatcher are derived from the NameAndPropertyAlignment matcher of Alignment API 4.5 [3].

PBCTMatcher
Its full name is Property-based Class Token Matcher. For each class in the first input ontology, it finds the class in the second input ontology with the highest integrated similarity. It computes the integrated similarity of two input classes by combining the similarity of the classes with the similarity of their properties, using the following equation:

[equation not preserved in the source text]

In the equation, CStfidf denotes the TF-IDF Cosine similarity between the input classes, and PStfidf denotes the TF-IDF Cosine similarity between the properties belonging to the input classes. The TF-IDF Cosine similarity is computed in the same way as in TFIDFMatcher.

PBCSMatcher
Its full name is Property-based Class String Matcher. It works like PBCTMatcher, except that it computes the integrated similarities of the input classes using the LCS (Longest Common Subsequence) similarity computing scheme rather than the TF-IDF similarity computing scheme used in PBCTMatcher.

1.2.3 Optimization Level Modules

ThresholdFilter
This module filters the aligned results stored in each matching module according to that module's default filter threshold. Each aligned result whose similarity is lower than the specified filter threshold is deleted from its matching module.

AlignmentMerger
This module merges the aligned results stored in all matching modules in a specific order. Its merging type is called Absorb: when it merges the aligned results of two matching modules, it preserves all aligned results of the former and discards any aligned result of the latter that partly or completely overlaps with one in the former. A merging example is given in Fig. 2.

Fig. 2. A merging example of AlignmentMerger.

In Fig. 2, AlignmentMerger merges the aligned results of the matching modules A1 and A2. Let Ci,j be the j-th object in ontology i. Suppose the aligned results in A1 consist of two pairs of objects and those in A2 also consist of two pairs. Because one pair in A2 partly overlaps with a pair in A1, it is discarded, and the merged aligned results consist of three pairs: the two pairs from A1 and the remaining pair from A2.

1.3 Adaptations made for the evaluation

ODGOMS uses the same parameters to run every experiment in all tracks of OAEI 2013. The parameters are divided into two groups. The first group contains the default filter thresholds used by ThresholdFilter: 1.0 for LabelMatcher, IDMatcher, and SMOAMatcher; 0.87 for LCSMatcher, PurityMatcher, and NETMatcher; 0.8 for PBCSMatcher; 0.781 for TFIDFMatcher; and 0.3 for PBCTMatcher. The second group contains the merging order used by AlignmentMerger, which is: LabelMatcher, IDMatcher, LCSMatcher, SMOAMatcher, PurityMatcher, TFIDFMatcher, NETMatcher, PBCTMatcher, and PBCSMatcher.

1.4 Link to the system and parameters file

Readers can download the execution files of all versions of ODGOMS from our Google SkyDrive download location¹ and test them using SEALS client 4.1. Please refer to the SEALS client tutorial for more testing examples.
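The filtering and Absorb merging of Sections 1.2.3 and 1.3 can be sketched as follows. The thresholds and merging order are the values stated in Section 1.3; everything else is an assumption made for illustration: the `Match` type, the function names, the reading of "overlap" as sharing the source or the target entity with an already kept pair, and the entity names in the usage note.

```python
from typing import NamedTuple


class Match(NamedTuple):
    source: str        # entity from the first ontology
    target: str        # entity from the second ontology
    similarity: float


# Default filter thresholds and merging order as listed in Section 1.3.
THRESHOLDS = {
    "LabelMatcher": 1.0, "IDMatcher": 1.0, "SMOAMatcher": 1.0,
    "LCSMatcher": 0.87, "PurityMatcher": 0.87, "NETMatcher": 0.87,
    "PBCSMatcher": 0.8, "TFIDFMatcher": 0.781, "PBCTMatcher": 0.3,
}
MERGE_ORDER = [
    "LabelMatcher", "IDMatcher", "LCSMatcher", "SMOAMatcher", "PurityMatcher",
    "TFIDFMatcher", "NETMatcher", "PBCTMatcher", "PBCSMatcher",
]


def threshold_filter(matches, threshold):
    """Keep only the aligned results whose similarity is not below the threshold."""
    return [m for m in matches if m.similarity >= threshold]


def absorb_merge(former, latter):
    """Keep every result of `former`; add a result of `latter` only if it
    shares neither its source nor its target with a result of `former`."""
    used_sources = {m.source for m in former}
    used_targets = {m.target for m in former}
    merged = list(former)
    for m in latter:
        if m.source not in used_sources and m.target not in used_targets:
            merged.append(m)
    return merged


def filter_and_merge(results_by_module):
    """Filter each module's results, then absorb them in the fixed order."""
    merged = []
    for name in MERGE_ORDER:
        kept = threshold_filter(results_by_module.get(name, []), THRESHOLDS[name])
        merged = absorb_merge(merged, kept)
    return merged
```

As a usage illustration with made-up entities: if LabelMatcher holds the pair (o1#Paper, o2#Paper) and LCSMatcher holds (o1#Paper, o2#Article) at 0.9 and (o1#Author, o2#Writer) at 0.88, the first LCSMatcher pair is absorbed because o1#Paper is already aligned, and the second is kept.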




Publication date: 2013